We use martingales to study Bayesian consistency. We derive sufficient conditions for both Hellinger and Kullback-Leibler consistency, which do not rely on the use of a sieve. Alternative sufficient conditions for Hellinger consistency are also found and demonstrated on examples.


1. Introduction. Let $X_1, X_2, \ldots$, taking values in a sample space, be independent and identically distributed random variables from some fixed but unknown (the true) density function $f_0$, with corresponding distribution function $F_0$. Let $F_0^n$ be the corresponding $n$-fold product measure on the $n$-fold product of the sample space, and let $F_0^\infty$ denote the infinite product measure.

With $f_0$ being unknown, the Bayesian constructs a prior distribution $\Pi$ on $\Omega$, the space of density functions on the sample space. This prior combines with the data to define the posterior distribution $\Pi^n$, assigning mass

$$\Pi^n(A) = \frac{\int_A R_n(f)\,\Pi(df)}{\int R_n(f)\,\Pi(df)} \tag{1}$$

to the set of densities $A$, where

$$R_n(f) = \prod_{i=1}^{n} f(X_i)/f_0(X_i).$$

The predictive density is given by

$$f_n(x) = \int f(x)\,\Pi^n(df).$$

Here, and throughout, we assume that all relevant $f$ are, in fact, densities with respect to the Lebesgue measure.
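
To make (1) and the predictive density concrete, the following is a minimal Python sketch for a prior concentrated on finitely many densities. The three unit-variance normal candidates, the prior weights and the sample size are illustrative assumptions, not taken from the paper, and the helper names are hypothetical. Because the factor $\prod_i f_0(X_i)$ is common to the numerator and denominator of (1), the sketch works with log-likelihoods directly.

```python
import numpy as np

def normal_pdf(x, mean, sd=1.0):
    """Density of N(mean, sd^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def posterior_masses(x, means, prior):
    """Posterior mass of each candidate density f_k = N(means[k], 1).

    The common factor prod_i f_0(X_i) in R_n(f) cancels in (1),
    so only the log-likelihood of each candidate is needed.
    """
    loglik = np.array([np.sum(np.log(normal_pdf(x, m))) for m in means])
    logw = loglik + np.log(prior)
    logw -= logw.max()              # stabilise before exponentiating
    w = np.exp(logw)
    return w / w.sum()

def predictive_density(t, means, post):
    """f_n(t): the posterior-weighted mixture of the candidate densities."""
    return sum(p * normal_pdf(t, m) for p, m in zip(post, means))

# Illustrative (assumed) set-up: the true density f_0 is the middle candidate.
rng = np.random.default_rng(0)
means = np.array([-1.0, 0.0, 1.0])
prior = np.array([0.25, 0.5, 0.25])
x = rng.normal(0.0, 1.0, size=200)
post = posterior_masses(x, means, prior)
```

In this toy setting the posterior mass of any set $A$ excluding the true candidate is simply the sum of the corresponding entries of `post`, which is the quantity whose almost sure convergence to zero is studied in the paper.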
This paper is concerned with Hellinger and Kullback-Leibler consistency. For example, for Hellinger consistency, the required result we are aiming for is

$$\Pi^n(\{f : H(f,f_0) > \varepsilon\}) \to 0 \quad \text{a.s.}$$

for all $\varepsilon > 0$, where $H(f,f_0)$ is the Hellinger distance between $f$ and $f_0$, given by

$$H(f,f_0) = \left\{\int \big(\sqrt{f} - \sqrt{f_0}\big)^2\right\}^{1/2}.$$

Previous studies of Hellinger consistency [see, e.g., Barron, Schervish and Wasserman (1999) and Ghosal, Ghosh and Ramamoorthi (1999)] deal with the numerator and denominator in the expression for $\Pi^n(A)$ separately. Briefly, if $\Pi$ puts positive mass on all Kullback-Leibler neighborhoods of $f_0$ (which will be referred to as the Kullback-Leibler property for $\Pi$), then the denominator is eventually bounded below by $\exp(-nc)$ for all $c > 0$. Setting $A = \{f : H(f,f_0) > \varepsilon\}$, for some $\varepsilon > 0$, with constraints on the prior ensuring sufficiently low mass on densities which track the data too closely, the numerator can be eventually bounded above by $\exp(-n\delta)$, for some $\delta > 0$. Consequently, with the appropriate conditions in place, $\Pi^n(A) \to 0$ a.s., with exponential rate, for all $\varepsilon > 0$.

To be more explicit, current approaches introduce an increasing sequence of sets $\mathcal{G}_n$, a sieve, and consider

$$\Pi^n(A) = \Pi^n(\mathcal{G}_n \cap A) + \Pi^n(\mathcal{G}_n^c \cap A).$$

Putting sufficiently low mass on densities which track the data too closely, that is, the densities in $\mathcal{G}_n^c$, involves ensuring that $\Pi(\mathcal{G}_n^c) < \exp(-n\xi)$ for all large $n$ and for some $\xi > 0$. This results in $\Pi^n(\mathcal{G}_n^c) < \exp(-n\xi^*)$ a.s. for all large $n$ for some $\xi^* > 0$. The aim then is to find $\mathcal{G}_n$ such that

$$\Pi^n(\mathcal{G}_n \cap A) < \exp(-n\delta) \quad \text{a.s.}$$

for all large $n$ for some $\delta > 0$. Approaches differ in the precise form of $\mathcal{G}_n$ which guarantees the above. For example, Ghosal, Ghosh and Ramamoorthi (1999) have $J(\eta, \mathcal{G}_n) < n\beta_\eta$ for all large $n$, for some $\beta_\eta > 0$ for all $\eta > 0$, where $J$ is the $L_1$ metric entropy.

We also deal with the numerator and denominator separately but study the numerator via different techniques which include the use of martingales. We do not use sieves. To fix the notation, define

$$f_{nA}(x) = \int f(x)\,\Pi^n_A(df)$$

to be the predictive density with posterior distribution restricted, and normalized, to the set $A$, let $h(f,f_0) = 1 - \int\sqrt{f f_0} = H^2(f,f_0)/2$ be a slight variation on the Hellinger distance $H$, and note that $h(f,f_0) \le 1$. Also define $I_n = \int R_n(f)\,\Pi(df)$ and $D(f,f_0) = \int f_0\log(f_0/f)$ to be the Kullback-Leibler divergence between $f$ and $f_0$. The Kullback-Leibler property is given by

$$\Pi(\{f : D(f,f_0) < \varepsilon\}) > 0$$

for all $\varepsilon > 0$. Since $f_0$ is unknown, the condition is

$$\Pi(\{f : D(f,g) < \varepsilon\}) > 0$$

for all densities $g$ and all $\varepsilon > 0$. This is possible to achieve using nonparametric priors. See Barron, Schervish and Wasserman (1999) and Ghosal, Ghosh and Ramamoorthi (1999).

The layout of the paper is as follows. In Section 2 we present preliminary results based on certain martingales. Section 3 unifies approaches to posterior consistency via the use of these martingales and Section 4 deals with the special case of consistency for predictive densities. Section 5 presents a specific result for Hellinger consistency which does not use martingales and examples are presented in Section 6. Section 7 contains a discussion and highlights areas for future research.

2. Preliminaries. Here we will discuss fundamental concepts and ideas on which the paper is based. Our concern is with the numerator $L_n = \int_A R_n(f)\,\Pi(df)$ of (1), where $A$ is a set of densities. We have already established that the Kullback-Leibler property will always deal appropriately with the denominator. The following identity is the key:

$$L_{n+1}/L_n = f_{nA}(X_{n+1})/f_0(X_{n+1}), \qquad n = 0, 1, \ldots, \tag{2}$$

and it is easy to check that this holds. From here we can go in one of two directions. The first option is based on martingales and takes $A = \{f : d(f,f_0) > \varepsilon\}$, where $d$ metricizes weak convergence, or is the Hellinger or the Kullback-Leibler distance. The second option, in the case of the Hellinger distance, is to split $\{f : H(f,f_0) > \varepsilon\}$ into a countable number of disjoint sets $\{A_j\}$ based on Hellinger balls, $A_j = \{f : H(f,f_j) < \delta\}$ for some suitable set of densities $\{f_j\}$ and some $\delta > 0$. This is possible due to the separability of $\Omega$ with respect to the Hellinger metric.
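
Identity (2) can be checked numerically when the prior restricted to $A$ sits on finitely many densities. The sketch below is only an illustration under assumed ingredients: unit-variance normal candidates, $f_0 = N(0,1)$, and a hypothetical helper `check_identity`; none of these appear in the paper.

```python
import numpy as np

def normal_pdf(x, mean):
    """Density of N(mean, 1) evaluated at x."""
    return np.exp(-0.5 * (x - mean) ** 2) / np.sqrt(2.0 * np.pi)

def check_identity(x, means_A, prior_A, mean0=0.0):
    """Return (L_{n+1}/L_n, f_{nA}(X_{n+1})/f_0(X_{n+1})) with n = len(x) - 1.

    means_A and prior_A describe the densities in A and their prior masses;
    the prior masses need not sum to one because A is only part of the space.
    """
    past, new = x[:-1], x[-1]
    # R_n(f_k) = prod_i f_k(X_i)/f_0(X_i) for each f_k in A (empty product = 1)
    R_n = np.array([np.prod(normal_pdf(past, m) / normal_pdf(past, mean0))
                    for m in means_A])
    R_next = R_n * normal_pdf(new, means_A) / normal_pdf(new, mean0)
    L_n = np.sum(R_n * prior_A)
    L_next = np.sum(R_next * prior_A)
    post_A = R_n * prior_A / L_n      # posterior restricted and normalized to A
    f_nA_new = np.sum(post_A * normal_pdf(new, means_A))
    return L_next / L_n, f_nA_new / normal_pdf(new, mean0)

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=21)
lhs, rhs = check_identity(x, np.array([1.0, 2.0]), np.array([0.3, 0.2]))
```

Passing a single observation exercises the case $n = 0$, where $L_0$ reduces to the prior mass $\Pi(A)$.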

The two approaches share similarities, in that both use (2), but are otherwise different. The first covers a range of types of consistency, whereas the second seems suited only to Hellinger consistency. To set the scene for the first option we consider measurable functions $T_d$, linked to a distance measure $d$, such that

$$E\{T_d(L_{n+1}/L_n) \mid \mathcal{F}_n\} = -d(f_{nA}, f_0),$$

where $\mathcal{F}_n = \sigma(X_1, \ldots, X_n)$. If $T_d(y) = \sqrt{y} - 1$, then $d(f,f_0) = h(f,f_0)$ and if $T_d(y) = \log y$, then $d(f,f_0) = D(f,f_0)$. Other cases arise; for example, if $T_d(y) = 1 - 1/y$, then $d(f,f_0) = \int f_0^2/f - 1$, which is the $\chi$-squared distance.
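
The link between each $T_d$ and its distance can be verified by quadrature. By (2), the conditional expectation is an integral of $T_d(f/f_0)$ against $f_0$ with $f = f_{nA}$. The sketch below assumes, purely for illustration, $f_0 = N(0,1)$ and $f = N(1,1)$, for which the three distances have the closed forms $h = 1 - e^{-1/8}$, $D = 1/2$ and $\int f_0^2/f - 1 = e - 1$.

```python
import numpy as np

# Grid wide enough that both normal densities are negligible at the ends.
x = np.linspace(-12.0, 12.0, 240001)
dx = x[1] - x[0]
f0 = np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)          # assumed true density
f = np.exp(-0.5 * (x - 1.0) ** 2) / np.sqrt(2.0 * np.pi)   # assumed f_{nA}
y = f / f0

def expect_under_f0(values):
    """Riemann-sum approximation of the integral of values * f0."""
    return np.sum(values * f0) * dx

E_sqrt = expect_under_f0(np.sqrt(y) - 1.0)   # should equal -h(f, f0)
E_log = expect_under_f0(np.log(y))           # should equal -D(f, f0)
E_inv = expect_under_f0(1.0 - 1.0 / y)       # should equal -(chi-squared distance)
```

All three expectations are negative, which is what drives the martingale argument: on average each factor $L_{n+1}/L_n$ shrinks the numerator.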
Now consider the martingale $(M_N, \mathcal{F}_N)$ given by

$$M_N = \sum_{n=1}^{N}\{T_d(L_n/L_{n-1}) + d(f_{n-1A}, f_0)\}.$$

A well-known result for such martingales [see Loève (1963)] is that if

$$\sum_n n^{-2}\,\mathrm{Var}\{T_d(L_n/L_{n-1})\} < \infty, \tag{3}$$

then $M_N/N \to 0$ a.s. Consequently, if

$$\liminf_n d(f_{nA}, f_0) > 0 \quad \text{a.s.},$$

then

$$\limsup_N \frac{1}{N}\sum_{n=1}^{N} T_d(L_n/L_{n-1}) < 0 \quad \text{a.s.}$$

For both cases of $T_d(y) = \sqrt{y} - 1$ and $T_d(y) = \log y$, the above implies that there exists a $\delta > 0$ such that $L_N < \exp(-N\delta)$ a.s. for all large $N$. This result can be achieved for $T_d(y) = \sqrt{y} - 1$ by making use of the fact that an arithmetic mean is greater than or equal to a geometric mean, so that $N^{-1}\sum_{n=1}^{N}\sqrt{L_n/L_{n-1}} \ge (L_N/L_0)^{1/(2N)}$, and it is clearly true for $T_d(y) = \log y$. It is worth writing this down formally.


Lemma 1. Let $L_n = \int_A R_n(f)\,\Pi(df)$ and $T_d(y) = \sqrt{y} - 1$ or $T_d(y) = \log y$. If (3) holds and

$$\liminf_n d(f_{nA}, f_0) > 0 \quad \text{a.s.},$$

then $L_N < \exp(-N\delta)$ a.s. for some $\delta > 0$ for all large $N$.

This result, namely, $L_N < \exp(-N\delta)$ a.s. for some $\delta > 0$ for all large $N$, combined with the Kullback-Leibler property for $\Pi$, leads to $\Pi^N(A) \to 0$ a.s. This follows since the Kullback-Leibler property implies $I_N > \exp(-Nc)$ a.s. for all large $N$, for any $c > 0$. Hence, we can choose $c < \delta$.

3. Posterior consistency. In this section we unify posterior consistency based on Lemma 1. Here we will drop the subscript $d$ from $T$.

3.1. Weak consistency. Here we have $A = \{f : d_W(f,f_0) > \varepsilon\}$, where $d_W$ metricizes weak convergence of probability distributions, that is, $d_W(f_n,f_0) \to 0$ if and only if $\int g(x)f_n(x)\,dx \to \int g(x)f_0(x)\,dx$ for all continuous and bounded $g$. Now $H(f_{nA}, f_0) > \varepsilon^*$ for all large $n$ a.s. for some $\varepsilon^* > 0$ since eventually $f_{nA}$ does not lie in a weak neighborhood of $f_0$ and so neither does it lie in a Hellinger neighborhood of $f_0$. Hence, taking $T(y) = \sqrt{y} - 1$, we have

$$\sum_n n^{-2}\,\mathrm{Var}\{T(L_n/L_{n-1})\} < \infty \tag{4}$$

automatically as $E(L_n/L_{n-1}) = 1$. Hence, both conditions of Lemma 1 are satisfied and so the Kullback-Leibler property is sufficient for weak consistency. This is, of course, known; see Schwartz (1965).

3.2. Hellinger consistency. Here we retain $T(y) = \sqrt{y} - 1$ and consider $A = \{f : H(f,f_0) > \varepsilon\}$. While, as in Section 3.1, (4) remains true, we do not automatically have $\liminf_n H(f_{nA}, f_0) > 0$ a.s. Hence, only one of the conditions of Lemma 1 is satisfied automatically.

Theorem 1. If $\Pi$ has the Kullback-Leibler property, then

$$\Pi^n(A) \to 0 \quad \text{a.s.}$$

for all sets $A$ for which $\liminf_n H(f_{nA}, f_0) > 0$ a.s.

This extends Walker (2003) who showed that if $\Pi$ has the Kullback-Leibler property and $H(f_{nA}, f_0) > \gamma$ for all $n$ a.s. for some $\gamma > 0$, then $\Pi^n(A) \to 0$ a.s. This result was then used to obtain the Hellinger consistency result of Ghosal, Ghosh and Ramamoorthi (1999).

3.3. Kullback-Leibler consistency. In view of the importance of the Kullback-Leibler property to Bayesian consistency, it would make sense to find additional sufficient conditions for posteriors to accumulate in all Kullback-Leibler neighborhoods of $f_0$. There are also practical reasons. A Bayesian approach to parametric prediction advocated by Walker and Gutiérrez-Peña (1999) entails minimizing $D(f_n, f_\lambda)$. Here $f_\lambda$ is a parametric family of densities and $f_n$ is a nonparametric predictive density. For large sample suitability of this procedure it is important that $D(f_n, f_0) \to 0$ a.s. Further motivation for Kullback-Leibler consistency is given in Barron (1988) who cites universal data compression and stock market portfolio selection as applications where this type of consistency is important.

For the martingale $M_N$ we now take $T(y) = \log y$ and consider $A = \{f : D(f,f_0) > \varepsilon\}$. In this case neither of the conditions of Lemma 1 holds automatically.

Theorem 2. If $\Pi$ has the Kullback-Leibler property and

$$\sum_n n^{-2}\,\mathrm{Var}\{\log(L_n/L_{n-1})\} < \infty, \tag{5}$$

then $\Pi^n(A) \to 0$ a.s. for all sets $A$ for which $\liminf_n D(f_{nA}, f_0) > 0$ a.s.
To examine Theorem 2 further, we write $L_N = \Pi^N(A)\,I_N$, giving

$$M_N = \log I_N + \log\{\Pi^N(A)/\Pi(A)\} + \sum_{n=1}^{N} D(f_{n-1A}, f_0).$$

If $\Pi$ has the Kullback-Leibler property, then $N^{-1}\log I_N \to 0$ a.s. This follows since $I_n > \exp(-nc)$ a.s. for all large $n$ for any $c > 0$ and, because $E(I_n) = 1$, we have $I_n < \exp(nc)$ a.s. for all large $n$ for any $c > 0$. So, $M_N/N \to 0$ a.s. and $\Pi^N(A) < \exp(-Nc)$ a.s. for some $c > 0$ for all large $N$ together imply that

$$\liminf_N \frac{1}{N}\sum_{n=1}^{N} D(f_{n-1A}, f_0) > c \quad \text{a.s.} \tag{6}$$

Hence, Theorem 2 could be written as follows:

Theorem 2*. If $\Pi$ has the Kullback-Leibler property and (5) holds, then $\Pi^n(A) < \exp(-nc)$ a.s. for all large $n$ if and only if (6) holds.

If $A = \{f : D(f,f_0) > \varepsilon\}$, then one anticipates that $\liminf_n D(f_{nA}, f_0) > \varepsilon$ a.s. However, it is difficult to establish when $\liminf_N N^{-1}\sum_{n=1}^{N} D(f_{n-1A}, f_0) > 0$ a.s. Yet, when $\Pi$ has the Kullback-Leibler property and (5) holds, which is not a particularly demanding condition, it does become a necessary condition for Kullback-Leibler consistency with exponential rate. It should also be pointed out that Theorem 2* equally applies to Hellinger consistency when $A = \{f : H(f,f_0) > \varepsilon\}$ and the necessary condition also applies.

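4. Predictive consistency. Here we take $A \equiv \Omega$ so that $f_{nA} = f_n$, the predictive density. Also, $L_n = I_n$, the denominator of (1). Hence,

$$M_N = \sum_{n=1}^{N}\{T(I_n/I_{n-1}) + d(f_{n-1}, f_0)\}.$$

Lemma 2. If $\Pi$ has the Kullback-Leibler property and $T(y) = \sqrt{y} - 1$ or $T(y) = \log y$, then

$$\frac{1}{N}\sum_{n=1}^{N} T(I_n/I_{n-1}) \to 0 \quad \text{a.s.}$$

Proof. This is obvious with $T(y) = \log y$ since $N^{-1}\log I_N \to 0$ a.s. as $N \to \infty$ when $\Pi$ has the Kullback-Leibler property. When $T(y) = \sqrt{y} - 1$, we know that $M_N/N \to 0$ a.s. and so

$$\frac{1}{N}\sum_{n=1}^{N} T(I_n/I_{n-1}) + \frac{1}{N}\sum_{n=1}^{N} h(f_{n-1}, f_0) \to 0 \quad \text{a.s.}$$

Now, by the arithmetic and geometric mean inequality,

$$\frac{1}{N}\sum_{n=1}^{N} T(I_n/I_{n-1}) \ge I_N^{1/(2N)} - 1 \to 0 \quad \text{a.s.}$$

and so

$$\frac{1}{N}\sum_{n=1}^{N} T(I_n/I_{n-1}) \to 0 \quad \text{a.s.}$$

as $h \ge 0$, completing the proof. □

If we take $T(y) = \sqrt{y} - 1$, then $M_N/N \to 0$ a.s. and, from Lemma 2, we have

$$\frac{1}{N}\sum_{n=1}^{N} h(f_{n-1}, f_0) \to 0 \quad \text{a.s.}$$

This result is found in Walker (2003). The following theorem applies by considering $T(y) = \log y$.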


Theorem 3. If $\Pi$ has the Kullback-Leibler property and

$$\sum_n n^{-2}\,\mathrm{Var}\{\log(I_n/I_{n-1})\} < \infty, \tag{7}$$

then

$$\frac{1}{N}\sum_{n=1}^{N} D(f_{n-1}, f_0) \to 0 \quad \text{a.s.}$$

It is straightforward to demonstrate that (7) holds when

$$\sup_n\left\{E_{X^n}\int f_0^2/f_n\right\} < \infty.$$

Here $E_{X^n}$ is the expectation with respect to $X^n = (X_1, \ldots, X_n)$ taken independently from $f_0$. See Section 6.4 for an example illustrating a nonparametric prior for which $\sup_n\{E_{X^n}\int f_0^2/f_n\} < \infty$.

5. Hellinger consistency. To introduce the ideas here, consider the discrete prior which puts mass $\Pi_k$ on the density function $f_k$, for $k = 1, 2, \ldots$. In this case the posterior mass assigned to $f_k$ is given by

$$\Pi^n_k = \frac{R_n(f_k)\,\Pi_k}{\sum_j R_n(f_j)\,\Pi_j}.$$

If we assume the prior has the Kullback-Leibler property, then the additional condition for Hellinger consistency turns out to be

$$\sum_k \sqrt{\Pi_k} < \infty.$$

Remark 1. The result provides information concerning the counterexample appearing in Barron, Schervish and Wasserman (1999), which shows that the Kullback-Leibler property for $\Pi$ is not sufficient for Hellinger consistency. The prior in this case puts positive mass on single densities and, for each integer $N$, has sets of these densities $\mathcal{F}_N$ for which $\Pi(\mathcal{F}_N) > \eta N^{-2}$ for some $\eta > 0$. Clearly, then

$$\sum_N \sqrt{\Pi(\mathcal{F}_N)} = \infty.$$

Now $\Omega$ is separable; that is, we can cover $\Omega$ with a countable set of Hellinger balls of radius $\delta$ for any $\delta > 0$. Therefore,

$$A = \{f : h(f,f_0) > \varepsilon\}$$

can be covered by the countable union of disjoint sets $A_j$, where $A_j \subseteq A_j^* = \{f : h(f,f_j) < \delta\}$, and $\{f_j\}$ is a set of densities such that $h(f_j,f_0) > \varepsilon$. We can take $\delta < \varepsilon$ so that $h(f_{nA_j}, f_0) > \gamma > 0$, where $\gamma = \varepsilon - \delta$. This follows since

$$h(f_{nA_j}, f_0) \ge h(f_j, f_0) - h(f_{nA_j}, f_j)$$

and $h(f_{nA_j}, f_j) < \delta$.

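Theorem 4. If $\Pi$ has the Kullback-Leibler property and

$$\sum_j \sqrt{\Pi(A_j)} < \infty,$$

then $\Pi^n(A) \to 0$ a.s.

Proof. Now

$$\Pi^n(A) = \sum_j \Pi^n(A_j) \le \sum_j \sqrt{\Pi^n(A_j)} = \sum_j \sqrt{\int_{A_j} R_n(f)\,\Pi(df) \Big/ I_n}.$$

If

$$\Lambda_{nj} = \sqrt{\int_{A_j} R_n(f)\,\Pi(df)},$$

then

$$\Lambda_{n+1j} = \Lambda_{nj}\sqrt{f_{nA_j}(X_{n+1})/f_0(X_{n+1})};$$

see Section 2, equation (2), which includes the case when $n = 0$ and $\Lambda_{0j} = \sqrt{\Pi(A_j)}$. So,

$$E(\Lambda_{n+1j} \mid \mathcal{F}_n) = \Lambda_{nj}\{1 - h(f_{nA_j}, f_0)\} < (1-\gamma)\Lambda_{nj}$$

and, hence, $E(\Lambda_{nj}) < (1-\gamma)^n\sqrt{\Pi(A_j)}$. Therefore,

$$\mathrm{pr}\Big\{\sum_j \Lambda_{nj} > \exp(-nd)\Big\} < \exp(nd)(1-\gamma)^n\sum_j\sqrt{\Pi(A_j)}$$

and so if

$$\sum_j \sqrt{\Pi(A_j)} < \infty,$$

then

$$\sum_j \Lambda_{nj} < \exp(-nd) \quad \text{a.s.}$$

for all large $n$, for any $d < -\log(1-\gamma)$. The Kullback-Leibler property for $\Pi$ ensures that $I_n > \exp(-nc)$ a.s. for all large $n$, for any $c > 0$. This completes the proof. □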

Clearly, if the prior $\Pi$ puts mass $\Pi_k$ on the density $f_k$ for $k = 1, 2, \ldots$, then the required condition is simply

$$\sum_k \sqrt{\Pi_k} < \infty,$$

which is straightforward to arrange in practice.
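
As a small illustration of how this condition discriminates between discrete priors, consider masses proportional to $k^{-a}$. This polynomial family is an assumption chosen for illustration and is not taken from the paper. The square roots are proportional to $k^{-a/2}$, so the sum converges exactly when $a > 2$. The sketch below compares partial sums for $a = 3$ and $a = 2$; the normalizing constant is omitted because it does not affect convergence.

```python
def sqrt_mass_partial_sum(a, K):
    """Partial sum of sqrt(k^(-a)) for k = 1..K.

    Up to the normalizing constant of the prior, this is the partial sum of
    sqrt(Pi_k) when Pi_k is proportional to k^(-a).
    """
    return sum(k ** (-a / 2.0) for k in range(1, K + 1))

# Growth of the partial sums between K = 10^3 and K = 10^5.
growth_a3 = sqrt_mass_partial_sum(3.0, 100_000) - sqrt_mass_partial_sum(3.0, 1_000)
growth_a2 = sqrt_mass_partial_sum(2.0, 100_000) - sqrt_mass_partial_sum(2.0, 1_000)
```

For $a = 3$ the partial sums have essentially stabilised, whereas for $a = 2$ they keep growing like $\log K$, so a prior with masses of order $k^{-2}$ does not satisfy the condition even though the masses themselves are summable.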

The result of Theorem 4 can be applied to specific priors with good results; see Section 6. However, it does somewhat lack interpretation, as can be seen by the need to go from $\sum_j \Pi^n(A_j)$ to $\sum_j \sqrt{\Pi^n(A_j)}$. On the other hand, the appearance of square roots should not be a great surprise when dealing with the Hellinger distance.

6. Illustrations. Here we consider some examples (6.1 to 6.3) illustrating Theorem 4. We have $\Omega$ being covered by $\{A_1, A_2, \ldots\}$, which are mutually disjoint Hellinger balls of radius $\delta$. The aim then is to show that

$$\sum_j \sqrt{\Pi(A_j)} < \infty$$

and that this holds for all $\delta > 0$. Also, Section 6.4 will illustrate Theorem 3.
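6.1. Infinite-dimensional exponential families. Here we consider the case when $f$ is constructed from an infinite sequence of random variables, $\theta_1, \theta_2, \ldots$. The prior on the $\{\theta_j\}$ makes them independent and we assume that $\theta_j \sim N(0, \sigma_j^2)$. A $\delta$-covering of $\Omega$ will be the union of sets of the type

$$\{\theta : n_j\delta_j < \theta_j < (n_j+1)\delta_j,\ j = 1, 2, \ldots\}$$

for a sequence $\delta_j = \delta\gamma_j$, where the $\{\gamma_j\}$ do not depend on $\delta$. Here the $n_j$ are integers and can be between $-\infty$ and $+\infty$. It is convenient to define

$$A_{jn} = (n\delta_j, (n+1)\delta_j).$$

We are then interested in the finiteness of

$$\sum_{n_1=-\infty}^{\infty}\cdots\sum_{n_M=-\infty}^{\infty}\ \prod_{j=1}^{M}\sqrt{\mathrm{pr}(\theta_j \in A_{jn_j})}$$

as $M \to \infty$, which, because of symmetry, holds if

$$\prod_{j=1}^{\infty}\sum_{n=0}^{\infty}\sqrt{\mathrm{pr}(\theta_j \in A_{jn})} < \infty.$$

Dropping the subscript $j$ temporarily, we have

$$\begin{aligned}
\sum_{n=0}^{\infty}\sqrt{\mathrm{pr}(\theta \in A_n)} &< 1 + \sum_{n=1}^{\infty}\sqrt{\mathrm{pr}(\theta \in A_n)} \\
&< 1 + (2\pi)^{-1/4}(\delta/\sigma)^{1/2}\sum_{n=1}^{\infty}\exp\{-\delta^2n^2/(4\sigma^2)\} \\
&< 1 + (2\pi)^{-1/4}(\delta/\sigma)^{1/2}[\exp\{\delta^2/(4\sigma^2)\} - 1]^{-1} \\
&< 1 + 4^m m!\,(2\pi)^{-1/4}(\sigma/\delta)^{2m-1/2}
\end{aligned}$$

for any $m = 1, 2, \ldots$. The last inequality follows from

$$\exp(\xi) - 1 > \xi^m/m!, \qquad m = 1, 2, \ldots,$$

for all $\xi > 0$. The required condition on the $\{\sigma_j\}$ is then that

$$\prod_{j=1}^{\infty}\{1 + \psi(\sigma_j/\gamma_j)^{2m-1/2}\} < \infty$$

for all $\psi > 0$. This is achieved if

$$\sum_j (\sigma_j/\gamma_j)^{2m-1/2} < \infty.$$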
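
The chain of bounds for a single Gaussian coordinate can be checked numerically. The sketch below evaluates $\sum_{n\ge 0}\sqrt{\mathrm{pr}(\theta \in A_n)}$ for $\theta \sim N(0,\sigma^2)$ and compares it with the geometric-series bound and the polynomial bound. The function names, the truncation point `n_max` and the parameter values are illustrative assumptions.

```python
import math

def sqrt_mass_sum(sigma, delta, n_max=2000):
    """Sum over n = 0..n_max of sqrt(pr(theta in A_n)) for theta ~ N(0, sigma^2),
    where A_n = (n*delta, (n+1)*delta)."""
    total = 0.0
    scale = sigma * math.sqrt(2.0)
    for n in range(n_max + 1):
        lo, hi = n * delta / scale, (n + 1) * delta / scale
        # Interval probability from complementary error functions,
        # which stays accurate far out in the tail.
        p = 0.5 * (math.erfc(lo) - math.erfc(hi))
        total += math.sqrt(max(p, 0.0))
    return total

def geometric_bound(sigma, delta):
    """1 + (2 pi)^(-1/4) (delta/sigma)^(1/2) / (exp(delta^2/(4 sigma^2)) - 1)."""
    a = delta ** 2 / (4.0 * sigma ** 2)
    return 1.0 + (2.0 * math.pi) ** -0.25 * math.sqrt(delta / sigma) / math.expm1(a)

def polynomial_bound(sigma, delta, m):
    """1 + 4^m m! (2 pi)^(-1/4) (sigma/delta)^(2m - 1/2)."""
    return (1.0 + 4.0 ** m * math.factorial(m) * (2.0 * math.pi) ** -0.25
            * (sigma / delta) ** (2 * m - 0.5))

s = sqrt_mass_sum(1.0, 0.5)
g = geometric_bound(1.0, 0.5)
p1 = polynomial_bound(1.0, 0.5, 1)
p2 = polynomial_bound(1.0, 0.5, 2)
```

The polynomial bound is loose for any fixed coordinate, but its dependence on $\sigma/\delta$ is what allows the product over coordinates to be controlled by a sum of powers of $\sigma_j/\gamma_j$.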
To make this example specific, consider the infinite-dimensional exponential family on $[0,1]$ for which

$$f(x) = \exp\Big\{\sum_{j=0}^{\infty}\theta_j\phi_j(x) - c(\theta)\Big\},$$

where the $\{\phi_j\}$ are an orthonormal basis on $[0,1]$ and $c(\theta)$ ensures that $f$ integrates to 1. Such an orthonormal basis is given by

$$\phi_0(x) = 1 \quad\text{and}\quad \phi_j(x) = \sqrt{2}\cos(j\pi x) \quad\text{for } j \ge 1.$$

To ensure that $f$ is a density with probability 1, it is sufficient that $\sum_j \sigma_j < \infty$. Then, according to Barron, Schervish and Wasserman (1999), an additional condition sufficient for Hellinger consistency is that $\sum_j j\sigma_j < \infty$. So it is possible to have $\sigma_j \propto j^{-2-r}$ for any $r > 0$.

For our condition, we require

$$\sum_j (\sigma_j/\omega_j)^{2m-1/2} < \infty$$

for some sequence $\{\omega_j\}$ satisfying $\sum_j \omega_j < \infty$. This follows because we can take $\delta_j = \delta^*\omega_j/\sum_j\omega_j$ so that if $|\theta_{1j} - \theta_{2j}| < \delta_j$ for all $j$, then

$$\sup_{0<x<1}\Big|\sum_j\theta_{1j}\phi_j(x) - \sum_j\theta_{2j}\phi_j(x)\Big| < \delta^*\sqrt{2},$$

which implies that $h(f_1,f_2) < \delta = 1 - \exp(-\delta^*\sqrt{2})$. If we put $\omega_j \propto j^{-1-r}$ for any $r > 0$, then

$$\sum_j (j^{1+r}\sigma_j)^{2m-1/2} < \infty$$

is sufficient. Therefore, we can actually have $\sigma_j \propto j^{-1-q}$ for any $q > 0$, by choosing $m$ large enough. This then is seen to be an improvement on the condition provided by Barron, Schervish and Wasserman (1999).

See also Walker and Hjort (2001) who have essentially the same result of $\sigma_j \propto j^{-1-q}$ as being sufficient for Hellinger consistency when combined with the Kullback-Leibler property for $\Pi$.


6.2. Pólya trees. Here we consider a Pólya tree prior on $[0,1]$ with partition the dyadic intervals. Denote the sets at level $k$ by $B_{kj}$ for $j = 1, \ldots, 2^k$. Over these sets we have independent variables $\theta_{kj} \sim \mathrm{be}(a_k, a_k)$ for odd $j$ and $\theta_{kj+1} = 1 - \theta_{kj}$ again for odd $j$. If $\sum_k a_k^{-1} < \infty$, then a random density function, with respect to the Lebesgue measure on $[0,1]$, from the prior can be obtained via

$$f(x) = \lim_{M\to\infty} 2^M\prod_{k=1}^{M}\theta_{kj(x)},$$

where $kj(x)$ describes the interval at level $k$ in which $x$ lies; that is, $x \in B_{kj(x)}$. If $\sum_k a_k^{-1/2} < \infty$, then the prior puts positive mass on all Kullback-Leibler neighborhoods of densities $g$ for which $\int g\log g < \infty$. See, for example, Barron, Schervish and Wasserman (1999). However, according to Barron, Schervish and Wasserman (1999), the best sufficient condition for Hellinger consistency is $a_k = 8^k$, which is not a "nice" set-up and would impede those statisticians looking to incorporate relevant information. We improve on this using the sums of square roots of prior probabilities.

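First, we find the covering sets of $\Omega$ with Hellinger balls of radius $\delta$. If

$$\exp(-\delta_k) < \theta_{1kj}/\theta_{2kj} < \exp(\delta_k)$$

for all $j$, which is equivalent to

$$\exp(-\delta_k) < \theta_{1kj}/\theta_{2kj} < \exp(\delta_k)$$

and

$$\exp(-\delta_k) < (1-\theta_{1kj})/(1-\theta_{2kj}) < \exp(\delta_k)$$

for all odd $j$, and for all $k$, and $\sum_k \delta_k = \delta^*$, then it is easy to show that

$$\int\sqrt{f_1 f_2} > \exp(-\tfrac{1}{2}\delta^*)$$

and so $h(f_1,f_2) < \delta = 1 - \exp(-\delta^*/2)$. Here, for example,

$$f_1(x) = \lim_{M\to\infty} 2^M\prod_{k=1}^{M}\theta_{1kj(x)}.$$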

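Therefore, letting $\theta_k$ denote a generic random variable at level $k$, we split $[0,1]$, the range of $\theta_k$, into the sets $A_{k0} = (\tfrac{1}{2} - b_k, \tfrac{1}{2} + b_k)$, where

$$b_k = \tfrac{1}{2}\{\exp(\delta_k) - 1\}/\{\exp(\delta_k) + 1\}$$

and

$$A^-_{kl} = (c_k\exp\{-l\delta_k\},\ c_k\exp\{-(l-1)\delta_k\})$$

and

$$A^+_{kl} = (1 - c_k\exp\{-(l-1)\delta_k\},\ 1 - c_k\exp\{-l\delta_k\})$$

for $l = 1, 2, \ldots$, where $c_k = \tfrac{1}{2} - b_k$. Again, due to symmetry, that is, $\mathrm{pr}(\theta_k \in A^-_{kl}) = \mathrm{pr}(\theta_k \in A^+_{kl})$, we are interested in the finiteness, as $M \to \infty$, of

$$\sum_{n_{11}=0}^{\infty}\cdots\sum_{n_{M2^M-1}=0}^{\infty}\ \prod_{k=1}^{M}\ \prod_{j=1,3,\ldots,2^k-1}\sqrt{\mathrm{pr}(\theta_{kj} \in A^-_{kn_{kj}})},$$

with the convention that $A^-_{k0} = A_{k0}$, which is equivalent to the finiteness of

$$\prod_{k=1}^{\infty}\Big\{\sqrt{\mathrm{pr}(\theta_k \in A_{k0})} + \sum_{l=1}^{\infty}\sqrt{\mathrm{pr}(\theta_k \in A^-_{kl})}\Big\}^{2^{k-1}}.$$

The difference between this and the corresponding expression for the infinite-dimensional families is the power $2^{k-1}$, which is present due to the $2^{k-1}$ independent variables at level $k$. Now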

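$$\begin{aligned}
\mathrm{pr}(\theta_k \in A^-_{kl}) &= \frac{\Gamma(2a_k)}{\Gamma(a_k)^2}\int_{A^-_{kl}} x^{a_k-1}(1-x)^{a_k-1}\,dx \\
&< 2^{2a_k-1}\sqrt{a_k/\pi}\ c_k\exp(-l\delta_k)\{\exp(\delta_k) - 1\}\{\xi_{kl}(1-\xi_{kl})\}^{a_k-1},
\end{aligned}$$

where $\xi_{kl} = c_k\exp\{-(l-1)\delta_k\}$. Here we have used the inequality

$$\Gamma(2a)/\Gamma(a)^2 \le 2^{2a-1}\sqrt{a/\pi},$$

see Barron, Schervish and Wasserman (1999), and that if $x < \xi < \tfrac{1}{2}$, then $x(1-x) < \xi(1-\xi)$. Now $\xi_{kl}(1-\xi_{kl}) \le 1/4 - b_k^2$ for all $l$ and so

$$\sqrt{\mathrm{pr}(\theta_k \in A^-_{kl})} < \psi\,a_k^{1/4}\exp(-\tfrac{1}{2}l\delta_k)\sqrt{\exp(\delta_k) - 1}\,(1 - 4b_k^2)^{a_k/2-1/2}$$

for some fixed $\psi > 0$. Here we need only consider $k$ large enough for which $a_k > 1$. Therefore,

$$\sqrt{\mathrm{pr}(\theta_k \in A_{k0})} + \sum_{l=1}^{\infty}\sqrt{\mathrm{pr}(\theta_k \in A^-_{kl})} < 1 + \psi\,a_k^{1/4}\,\frac{\sqrt{\exp(\delta_k) - 1}}{\exp(\delta_k/2) - 1}\,(1 - 4b_k^2)^{a_k/2-1/2}.$$

We are then looking for

$$\sum_k 2^{k-1}a_k^{1/4}\,\frac{\sqrt{\exp(\delta_k) - 1}}{\exp(\delta_k/2) - 1}\,\exp(-2a_kb_k^2) < \infty.$$

Now we can take $\delta_k \propto k^{-1-r}$ for any $r > 0$ and for large $k$,

$$\frac{\sqrt{\exp(\delta_k) - 1}}{\exp(\delta_k/2) - 1} \sim 2\delta_k^{-1/2} \propto k^{1/2+r/2}.$$

Also, for large $k$, $b_k \sim \delta_k/4$ and so $a_k \propto k^{3+q}$ for any $q > 0$ is sufficient.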
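
Two ingredients of this argument lend themselves to a quick numerical check: the bound on the beta normalizing constant, and the eventual decay of the terms $2^{k-1}a_k^{1/4}\{\exp(\delta_k)-1\}^{1/2}\{\exp(\delta_k/2)-1\}^{-1}\exp(-2a_kb_k^2)$. The sketch below works on the log scale. The choices $\delta_k = k^{-1-r}$ and $a_k = k^{3+q}$ with $r = 0.25$ and $q = 1$ are illustrative assumptions (note $q > 2r$), and the function names are hypothetical.

```python
import math

def log_beta_constant(a):
    """log of Gamma(2a) / Gamma(a)^2."""
    return math.lgamma(2.0 * a) - 2.0 * math.lgamma(a)

def log_constant_bound(a):
    """log of the bound 2^(2a-1) * sqrt(a / pi)."""
    return (2.0 * a - 1.0) * math.log(2.0) + 0.5 * math.log(a / math.pi)

def b(delta):
    """b_k as a function of delta_k: (1/2)(e^delta - 1)/(e^delta + 1)."""
    return 0.5 * math.expm1(delta) / (math.exp(delta) + 1.0)

def log_term(k, r=0.25, q=1.0):
    """log of the k-th term of the series, with delta_k = k^(-1-r), a_k = k^(3+q)."""
    delta = k ** (-1.0 - r)
    a = float(k) ** (3.0 + q)
    ratio = math.sqrt(math.expm1(delta)) / math.expm1(delta / 2.0)
    return ((k - 1) * math.log(2.0) + 0.25 * math.log(a)
            + math.log(ratio) - 2.0 * a * b(delta) ** 2)

gamma_ok = all(log_beta_constant(a) <= log_constant_bound(a)
               for a in (0.5, 1.0, 2.0, 10.0, 100.0, 1.0e4))
```

With these choices the log-terms first increase and then fall away roughly like $-k^{1+q-2r}/8$, so the polynomially growing $a_k$ eventually overwhelms the factor $2^{k-1}$.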


6.3. Mixture of priors. A popular type of nonparametric prior consists of a mixture of priors,

$$\Pi = \sum_N p_N\Pi_N,$$

where $\sum_N p_N = 1$ and the $\{p_N\}$ are fixed. Here $\Pi_N$ is supported by densities in $C_N \subset \Omega$, so that $\Pi_N(C_N) = 1$. Typically, $C_N$ is totally bounded, that is, $C_N$ can be covered by a finite number of Hellinger balls of size $\delta$, for any $\delta > 0$. The number of such balls will be denoted by $I_N(\delta)$. More often than not, $C_N \subset C_{N+1}$ and $C_N$ converges to $\Omega$. See Petrone and Wasserman (2002), for example, who consider random densities generated via Bernstein polynomials.

Following the above specifications, if $\{A_1, A_2, \ldots\}$ covers $\Omega$, then we assume that $C_N$ is covered by $\{A_1, \ldots, A_{I_N}\}$ and, therefore, $\Pi_N(A_k) = 0$ for $k > I_N$. Hence,

$$\sum_k\sqrt{\Pi(A_k)} \le \sum_k\sqrt{\sum_{N:\,I_N \ge k} p_N}.$$

Consequently, if

$$\sum_k\sqrt{P(M_k)} < \infty,$$

where $P(M) = \sum_{N \ge M} p_N$ and $M_k = \min\{N : I_N \ge k\}$, then Hellinger consistency holds.

For example, if $I_N(\delta) = (c/\delta)^N$, for some $c > 0$ not depending on $\delta$, as is the case with Bernstein polynomials, then

$$M_k(\delta) = \lfloor\log k/\log(c/\delta)\rfloor$$

and, hence, we would wish that

$$P(M_k(\delta)) < a\,k^{-2-r},$$

for some $r > 0$ and $a > 0$, for all large $k$ and for all $\delta > 0$. This holds if $P(N) < a\exp(-N\psi)$ for all $\psi > 0$ for all large $N$, which holds if $N^{-1}\log P(N) \to -\infty$ as $N \to \infty$.

6.4. Random histogram. Here we consider a random histogram model on $[0,1]$ to illustrate Theorem 3. We take $m \in \{1, 2, \ldots\}$ with probability $\pi(m)$ and construct the random density function

$$f_m(x) = \sum_{k=1}^{m} w_{km}\,\mathbf{1}(a_{k-1m} < x < a_{km}),$$

where $w_{km} > 0$, $\sum_{k=1}^{m} w_{km} = m$ and $a_{km} = k/m$, $k = 0, 1, \ldots, m$. We will write $A_{km} = (a_{k-1m}, a_{km})$. We put $p_{km} = w_{km}/m$ and have a Dirichlet$(1, \ldots, 1)$ prior for $p_m = (p_{1m}, \ldots, p_{mm})$. Then

$$f_n(x) = \sum_{m=1}^{\infty} f_{nm}(x)\,\pi(m \mid X^n),$$

where

$$f_{nm}(x) = \sum_{k=1}^{m} w_{kmn}\,\mathbf{1}(a_{k-1m} < x < a_{km})$$
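and $w_{kmn} = E(w_{km} \mid X^n)$. Also, for a nonnegative variable $Z$, $1/EZ \le E(1/Z)$, so

$$\int f_0^2/f_n \le \sum_m \pi(m \mid X^n)\int f_0^2/f_{nm}$$

and, therefore,

$$E_{X^n}\int f_0^2/f_n \le E_{X^n}\sum_m \pi(m \mid X^n)\int f_0^2/f_{nm}.$$

Now $w_{kmn} = m(1 + n_{km})/(m + n)$, where $n_{km} = \sum_{i=1}^{n}\mathbf{1}(X_i \in A_{km})$, and so

$$E_{X^n}\int f_0^2/f_{nm} = \sum_{k=1}^{m}\int_{A_{km}} E[(m+n)/\{m(1+n_{km})\}]\,f_0^2.$$

It is easy to show that

$$E\{1/(1+n_{km})\} \le 1/\{(1+n)F_0(A_{km})\}$$

with $(n_{1m}, \ldots, n_{mm}) \sim \mathrm{multinomial}(n; F_0(A_{1m}), \ldots, F_0(A_{mm}))$, and so

$$E_{X^n}\int f_0^2/f_n \le \sum_m \pi(m)\sum_{k=1}^{m}\frac{m+n}{m(1+n)F_0(A_{km})}\int_{A_{km}} f_0^2 \le \lambda\sum_m \frac{m+n}{1+n}\,\pi(m),$$

where $\lambda = \sup_t f_0(t)$, which we will assume to be finite. Therefore,

$$\sup_n\left\{E_{X^n}\int f_0^2/f_n\right\} < \infty$$

when $\sum_m m\,\pi(m) < \infty$ and so if the prior puts positive mass on all Kullback-Leibler neighborhoods of $f_0$, then $N^{-1}\sum_{n=1}^{N} D(f_{n-1}, f_0) \to 0$ a.s.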
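
For a fixed number of bins $m$, the predictive histogram is easy to compute from the bin counts through $w_{kmn} = m(1+n_{km})/(m+n)$. The sketch below is an illustration only: the Beta(2, 2) sample, the choice $m = 10$ and the helper names are assumptions, and averaging over $m$ with weights $\pi(m \mid X^n)$ is not shown.

```python
import numpy as np

def histogram_predictive_weights(x, m):
    """Posterior-mean bin heights w_kmn = m (1 + n_km) / (m + n) on [0, 1],
    under the Dirichlet(1, ..., 1) prior on the bin probabilities."""
    counts, _ = np.histogram(x, bins=m, range=(0.0, 1.0))
    n = len(x)
    return m * (1.0 + counts) / (m + n)

def f_nm(t, w):
    """Evaluate the predictive histogram density with heights w at points t in [0, 1]."""
    m = len(w)
    k = np.minimum((np.asarray(t) * m).astype(int), m - 1)   # bin index of each t
    return w[k]

rng = np.random.default_rng(0)
x = rng.beta(2.0, 2.0, size=500)   # assumed data-generating density on [0, 1]
w = histogram_predictive_weights(x, 10)
```

Every height is at least $m/(m+n) > 0$, even for empty bins, which is what keeps $\int f_0^2/f_{nm}$ finite in the argument above.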


7. Discussion. As far as Hellinger consistency is concerned, the most fruitful sufficient conditions to date appear to be those involving the finiteness of sums of square roots of prior probabilities. Indeed, they improve on current sufficient conditions which have been published in the literature. In the case of Pólya trees the improvements are quite dramatic.

A framework for Kullback-Leibler consistency, which fits into a general framework including weak and Hellinger consistency, has been developed using martingales. Theorems 2, 2* and 3 suggest that the condition

$$\sum_n n^{-2}\,\mathrm{Var}\{\log(L_n/L_{n-1})\} < \infty$$

is highly significant and conditions under which this holds need to be understood.

Future work will investigate rates of convergence using the sums of square roots of prior probabilities approach. The basis for this is consideration of $\Pi^n(A_n)$, where $A_n = \{f : h(f,f_0) > \varepsilon_n\}$ and $\varepsilon_n \downarrow 0$. Following the proof of Theorem 4, we have

$$\Pi^n(A_n) < \exp(-nd_n)/\sqrt{I_n} \quad \text{a.s.}$$

for all large $n$, for any sequence $d_n$ satisfying

$$\sum_n \exp\{-n(\gamma_n - d_n)\}K_n < \infty,$$

where

$$K_n = \sum_j\sqrt{\Pi(A_{nj})}$$

and $\{A_{nj}\}$ covers $A_n$ with $\delta_n$ size Hellinger balls and $\gamma_n = \varepsilon_n - \delta_n$. Putting $\delta_n = \varepsilon_n/2$ so $\gamma_n = \varepsilon_n/2$ seems appropriate here. Then, for example, lower bounds for $I_n$ in an a.s. sense are available from Shen and Wasserman (2001), using the $\rho_\alpha(f,f_0) = \alpha^{-1}\int\{(f_0/f)^\alpha - 1\}f_0$ metric, for $0 < \alpha < 1$. To find rates, it is required to understand $K_n$, which will be prior specific and involve a refinement of the work appearing in Section 6.